7 research outputs found

    Using Synthetic Data to Train Neural Networks is Model-Based Reasoning

    We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data. Comment: 8 pages, 4 figures
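The link between synthetic-data training and learning a proposal distribution can be illustrated with a minimal sketch. It is not the paper's Captcha architecture: the linear-Gaussian model, the one-parameter "network", and all names below are assumptions chosen so that the exact posterior is known in closed form.

```python
import random

def simulate(rng, noise_std):
    """Draw one synthetic (latent, observation) pair from the generative model."""
    x = rng.gauss(0.0, 1.0)            # latent: x ~ N(0, 1)
    y = x + rng.gauss(0.0, noise_std)  # observation: y | x ~ N(x, noise_std^2)
    return x, y

def fit_proposal_slope(pairs):
    """Least-squares slope (no intercept) for predicting x from y.

    Minimizing squared error is the same as maximizing the log-likelihood of a
    Gaussian proposal q(x | y) = N(w * y, s^2) with fixed s on synthetic pairs.
    """
    sxy = sum(x * y for x, y in pairs)
    syy = sum(y * y for _, y in pairs)
    return sxy / syy

rng = random.Random(0)
noise_std = 0.5
pairs = [simulate(rng, noise_std) for _ in range(50_000)]
learned_slope = fit_proposal_slope(pairs)
# In this toy model the exact posterior mean of x given y is exact_slope * y.
exact_slope = 1.0 / (1.0 + noise_std ** 2)
print(learned_slope, exact_slope)
```

In this toy setting the slope fitted purely on simulated pairs approaches the exact posterior-mean coefficient, which is the sense in which the trained predictor acts as an approximate inference procedure for the generative model.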

    Amortized Rejection Sampling in Universal Probabilistic Programming

    Existing approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. An instance of this is importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove that our estimator has finite variance, empirically demonstrate its correctness and efficiency relative to existing alternatives on generative programs containing rejection sampling loops, and discuss how to implement our method in a generic probabilistic programming framework.

    Mask wearing in community settings reduces SARS-CoV-2 transmission

    The effectiveness of mask wearing at controlling severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission has been unclear. While masks are known to substantially reduce disease transmission in healthcare settings [D. K. Chu et al., Lancet 395, 1973–1987 (2020); J. Howard et al., Proc. Natl. Acad. Sci. U.S.A. 118, e2014564118 (2021); Y. Cheng et al., Science eabg6296 (2021)], studies in community settings report inconsistent results [H. M. Ollila et al., medRxiv (2020); J. Brainard et al., Eurosurveillance 25, 2000725 (2020); T. Jefferson et al., Cochrane Database Syst. Rev. 11, CD006207 (2020)]. Most such studies focus on how masks impact transmission, by analyzing how effective government mask mandates are. However, we find that widespread voluntary mask wearing, and other data limitations, make mandate effectiveness a poor proxy for mask-wearing effectiveness. We directly analyze the effect of mask wearing on SARS-CoV-2 transmission, drawing on several datasets covering 92 regions on six continents, including the largest survey of wearing behavior (n = 20 million) [F. Kreuter et al., https://gisumd.github.io/COVID-19-API-Documentation (2020)]. Using a Bayesian hierarchical model, we estimate the effect of mask wearing on transmission, by linking reported wearing levels to reported cases in each region, while adjusting for mobility and nonpharmaceutical interventions (NPIs), such as bans on large gatherings. Our estimates imply that the mean observed level of mask wearing corresponds to a 19% decrease in the reproduction number R. We also assess the robustness of our results in 60 tests spanning 20 sensitivity analyses. In light of these results, policy makers can effectively reduce transmission by intervening to increase mask wearing.

    Efficient Probabilistic Programming Languages

    In recent years, declarative programming languages specialized for probabilistic modeling have emerged as a distinct class of languages. These languages are predominantly written by researchers in the machine learning field and concentrate on generalized MCMC inference algorithms. Unfortunately, all these languages are too slow for practical adoption. In my talk, I will outline several places where compiler optimizations could improve these languages and make them more usable in an industrial setting.

    Simulation-based inference for global health decisions

    The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever-increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel avenue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria, into probabilistic programs, enabling efficient interpretable Bayesian inference within those simulators.
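As a rough illustration of calibrating a simulator by simulation-based inference, the sketch below uses rejection-based approximate Bayesian computation, one of the simplest methods in that family. It does not use COVID-sim or OpenMalaria: the toy simulator, the uniform prior, and every name here are hypothetical stand-ins, and the method is not necessarily the one the authors apply.

```python
import random

POPULATION = 20  # size of the toy population (assumed for illustration)

def simulate_cases(rng, infection_prob):
    """Toy simulator: each person is infected independently with infection_prob."""
    return sum(rng.random() < infection_prob for _ in range(POPULATION))

def abc_rejection(rng, observed_cases, num_draws):
    """Keep prior draws whose simulated case count equals the observed count."""
    accepted = []
    for _ in range(num_draws):
        p = rng.random()  # prior: p ~ Uniform(0, 1)
        if simulate_cases(rng, p) == observed_cases:
            accepted.append(p)
    return accepted

rng = random.Random(1)
observed_cases = 6
posterior_samples = abc_rejection(rng, observed_cases, num_draws=40_000)
posterior_mean = sum(posterior_samples) / len(posterior_samples)
# For this toy model the exact posterior is Beta(observed + 1, POPULATION - observed + 1).
exact_mean = (observed_cases + 1) / (POPULATION + 2)
print(round(posterior_mean, 3), round(exact_mean, 3))
```

Because the simulator is only ever called as a black box, the same loop would apply to any simulator with a comparable output, although exact-match rejection becomes impractical for complex models, which is part of the motivation for more efficient inference methods.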